make gensim optional #3493

Merged
alanakbik merged 14 commits into master from optional_gensim on Jul 19, 2024


@helpmefindaname (Collaborator) commented Jul 7, 2024

Closes #3482

This requires all public models to be in serialized format, so that they can be loaded without attempting to import gensim.

You can now use WordEmbeddings and BytePairEmbeddings for inference without having gensim/bpe installed.

FasttextEmbeddings and MuseEmbeddings will only work when gensim is installed. I find this justifiable, as those embeddings are not commonly used anymore.

When instantiating new WordEmbeddings or BytePairEmbeddings, gensim/bpe is required. You can install them with pip install flair[word-embeddings] after the next release, or pip install -e .[word-embeddings] when developing.
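
The "required only when instantiating" behaviour described above is a common optional-dependency pattern. Below is a minimal sketch of that pattern, not the code from this PR: the helper names `require_optional` and `build_word_embeddings_from_scratch` are hypothetical and are not part of flair's API.

```python
import importlib.util


def require_optional(module_name: str, extra: str) -> None:
    """Raise a helpful ImportError if an optional dependency is missing.

    Hypothetical helper for illustration only; not part of flair.
    """
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed. "
            f'Install it with: pip install "flair[{extra}]"'
        )


def build_word_embeddings_from_scratch() -> str:
    # The check runs only at the point where gensim is actually needed,
    # so importing this module never fails on a gensim-free install.
    require_optional("gensim", "word-embeddings")
    return "gensim available"
```

With this shape, loading an already serialized model never reaches the check, while creating new embeddings fails early with an install hint.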

@alanakbik (Collaborator) left a comment

Thanks a lot for this @helpmefindaname. This generally looks good.

I tested loading our standard 'ner' model that I re-serialized and pushed to the hub as "alanakbik/ner-new".

I then tested loading with the following code:

from flair.data import Sentence
from flair.models import SequenceTagger

model = SequenceTagger.load("alanakbik/ner-new")

sentence = Sentence("Bill was born in New York")

model.predict(sentence)

print(sentence)

That works, but only with newer Flair versions. Loading starts to fail at Flair 0.10.0.

To reproduce, in a fresh env, do:

pip install flair==0.10.0

and run the above code.

This throws the error:

AttributeError: 'dict' object has no attribute 'embedding_length'

I think this actually has nothing directly to do with this PR, but affects all new models that were trained with newer Flair versions and are loaded with older ones.

But to deploy this PR, we'd need to update all models. So if people are still using an old version of Flair, the regular 'ner' model would no longer work.

@helpmefindaname can you take a look if backward compatibility can be improved?
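
For readers unfamiliar with this kind of failure, here is a toy sketch of how such an AttributeError can arise. It assumes, without confirmation from this thread, that the newer serialization stores embeddings as a plain dict of parameters while an older loader expects an object. All names (`ToyEmbeddings`, `old_style_load`, `tolerant_load`) are hypothetical, and this is not how flair or this PR handles it.

```python
class ToyEmbeddings:
    """Hypothetical stand-in for an embeddings object."""

    def __init__(self, embedding_length: int) -> None:
        self.embedding_length = embedding_length


def old_style_load(state: dict) -> int:
    # An older loader that assumes the stored value is an object.
    return state["embeddings"].embedding_length


def tolerant_load(state: dict) -> int:
    # A loader that accepts both formats: it rebuilds the object
    # when it finds a dict of parameters instead.
    embeddings = state["embeddings"]
    if isinstance(embeddings, dict):
        embeddings = ToyEmbeddings(**embeddings)
    return embeddings.embedding_length


new_format_state = {"embeddings": {"embedding_length": 100}}

try:
    old_style_load(new_format_state)
except AttributeError as error:
    print(error)

print(tolerant_load(new_format_state))
```

Note that a tolerant loader only helps in releases that ship it; already published old versions can only be kept working by the format of the saved model itself.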

@alanakbik (Collaborator) commented

@helpmefindaname thanks a lot for adding this! I will now update all models on HF.

@alanakbik merged commit 9c4e1d2 into master on Jul 19, 2024
1 check passed
@alanakbik deleted the optional_gensim branch on July 19, 2024, 14:08
alanakbik added a commit that referenced this pull request Jul 23, 2024
@itsmedonttell commented

The pip command recommended by the gensim import error message didn't work out of the box for me. I use Terminal with zsh on macOS. I had to put quotes around flair[word-embeddings] when running pip install (like so: pip install "flair[word-embeddings]"); otherwise zsh throws an error and the install doesn't run. Alternatively, pip install gensim works.
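
The quoting issue arises because zsh treats the square brackets as a glob pattern. As a small sketch (variable names are illustrative), Python's standard `shlex.quote` shows the shell-safe form of the requirement, and an argument list avoids the shell entirely:

```python
import shlex
import sys

requirement = "flair[word-embeddings]"

# shlex.quote wraps the string in single quotes because "[" and "]"
# are special characters for the shell.
quoted = shlex.quote(requirement)
print(f"pip install {quoted}")

# When pip is launched from Python with an argument list (for example via
# subprocess.run(pip_command)), no shell parses the string, so no quoting
# is needed. The command is only built here, not executed.
pip_command = [sys.executable, "-m", "pip", "install", requirement]
```

The printed command uses single quotes, which work the same way as the double quotes used in the comment above.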

Development

Successfully merging this pull request may close these issues.

[Feature]: upgrade urllib3